---
name: literature-experiment-extract
description: Extract experimental models, experimental methods, and biomarker information from paper Markdown (typically produced by PDF-to-Markdown tools). Use when a user provides paper Markdown and needs a structured, evidence-backed summary (1 Markdown + 3 CSVs).
license: MIT
author: aipoch
---
> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a paper converted to Markdown (e.g., via PDF-to-Markdown) and need to extract **cell/animal models** used in experiments.
- You need a structured list of **experimental methods/protocols** described in the paper, with traceable evidence.
- You want to compile **biomarkers / detection indicators** (e.g., genes, proteins, assays, readouts) reported in the study.
- You need standardized outputs for downstream analysis: **one Markdown summary plus three CSV tables**.
- The paper Markdown includes page markers (e.g., `## Page XX`) and you want evidence organized **by page**.

## Key Features

- Extracts three entity groups from paper Markdown:
  - **Experimental models** (cell lines, animal models, strains, genotypes, etc.)
  - **Experimental methods** (assays, protocols, instruments, conditions)
  - **Biomarkers / indicators** (targets, readouts, measured variables)
- Produces **evidence-backed** results: every extracted item keeps a citation or excerpt that is traceable to the source.
- Supports **page-aware evidence organization** when the input includes pagination headers like `## Page XX`.
- Outputs are fixed and standardized:
  - **1 Markdown summary**
  - **3 CSV files**: models / methods / biomarkers
- Uses a predefined template and extraction rules:
  - Requirements and consistency rules: `references/guide.md`
  - Output template: `assets/template.md`

## Dependencies

- None (documentation-driven workflow).
- Input assumption: paper content is available as **Markdown**, typically generated by a **PDF-to-Markdown** tool.

## Example Usage

### Input

A paper converted to Markdown, ideally with page headers:

```md
## Page 1
... text describing "C57BL/6 mice" and "Western blot" ...

## Page 2
... text describing "ELISA" and "IL-6 levels" ...
```
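
Page-aware extraction starts by splitting the input on those headers. The following is a minimal sketch, not the skill's own implementation: the function name `split_pages` and the convention of storing text that precedes the first header under page `0` are assumptions made for illustration.

```python
import re

# Matches pagination headers of the form shown above, e.g. "## Page 1".
PAGE_HEADER = re.compile(r"^##[ \t]+Page[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> dict[int, str]:
    """Return {page_number: page_text}.

    Assumption: text before the first header (or a file with no
    headers at all) is stored under page 0.
    """
    matches = list(PAGE_HEADER.finditer(markdown))
    if not matches:
        return {0: markdown.strip()}

    pages: dict[int, str] = {}
    preamble = markdown[: matches[0].start()].strip()
    if preamble:
        pages[0] = preamble

    for i, match in enumerate(matches):
        # A page runs from the end of its header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        pages[int(match.group(1))] = markdown[match.end():end].strip()
    return pages


sample = (
    '## Page 1\n... text describing "C57BL/6 mice" and "Western blot" ...\n\n'
    '## Page 2\n... text describing "ELISA" and "IL-6 levels" ...\n'
)
pages = split_pages(sample)
```

Each value in `pages` can then be scanned for models, methods, and biomarkers, with the key recorded as the page of the evidence.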

### Steps

1. Open the paper Markdown (typically produced by PDF-to-Markdown tools).
2. Extract **models**, **methods**, and **biomarkers** page by page.
3. Follow:
   - Extraction rules and evidence requirements: `references/guide.md`
   - Output template: `assets/template.md`
4. Write **exactly** these four files:
   - `outputs/{Paper Abbreviation}-experiment-summary.md`
   - `outputs/{Paper Abbreviation}-models.csv`
   - `outputs/{Paper Abbreviation}-methods.csv`
   - `outputs/{Paper Abbreviation}-biomarkers.csv`
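
The four file names in step 4 follow one pattern, so they can be derived from the paper abbreviation. A small sketch under stated assumptions: `output_paths` is a hypothetical helper, and `Smith2023` is a made-up abbreviation used only to show the result.

```python
from pathlib import Path

# Suffixes taken from the required output list: one summary and three CSVs.
SUFFIXES = (
    "experiment-summary.md",
    "models.csv",
    "methods.csv",
    "biomarkers.csv",
)


def output_paths(paper_abbreviation: str, out_dir: str = "outputs") -> list[Path]:
    """Build the four required output paths for one paper."""
    return [Path(out_dir) / f"{paper_abbreviation}-{suffix}" for suffix in SUFFIXES]


paths = output_paths("Smith2023")  # hypothetical abbreviation
```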

### Output (required)

- All final outputs must be **UTF-8** encoded.
- Produce the outputs **directly**: do not ask for confirmation or offer optional branches.
- Evidence excerpts must remain in the **original language** of the source literature.
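
If the CSVs are written by a script, the encoding requirement amounts to opening each file with an explicit `encoding="utf-8"`. The sketch below uses Python's standard `csv` module. The column names (`name`, `page`, `evidence`) are placeholders, since the real columns are defined in `references/guide.md`, and the file is written to a temporary directory only so the example leaves no files behind.

```python
import csv
import tempfile
from pathlib import Path

# Placeholder columns; the actual fields come from references/guide.md.
FIELDS = ["name", "page", "evidence"]


def write_csv(path: Path, rows: list[dict[str, str]]) -> None:
    """Write rows as a UTF-8 CSV with a header line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# The excerpt is copied verbatim, in the source's own language.
rows = [
    {
        "name": "IL-6",
        "page": "2",
        "evidence": 'text describing "ELISA" and "IL-6 levels"',
    }
]
out_file = Path(tempfile.mkdtemp()) / "outputs" / "Example-biomarkers.csv"
write_csv(out_file, rows)
```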

## Implementation Details

- **Input parsing**
  - Read the paper Markdown as the sole input source.
  - If pagination headers like `## Page XX` exist, attach each piece of evidence to the page on which it appears.

- **Extraction rules**
  - Apply entity definitions, allowed/expected fields, normalization rules, and evidence formatting as specified in `references/guide.md`.

- **Output formatting**
  - Generate outputs using `assets/template.md` as the canonical structure.
  - Add rows to the template as needed, keeping the evidence citation or excerpt for every row.
  - The output set is fixed: **1 Markdown summary + 3 CSVs** (models/methods/biomarkers).

- **Paths and naming**
  - Default output directory: `outputs/`
  - Naming:
    - Markdown: `outputs/{Paper Abbreviation}-experiment-summary.md`
    - CSVs:
      - `outputs/{Paper Abbreviation}-models.csv`
      - `outputs/{Paper Abbreviation}-methods.csv`
      - `outputs/{Paper Abbreviation}-biomarkers.csv`

- **Language**
  - Write outputs in **Chinese by default**; if the user requests another language, use that instead.
  - Evidence excerpts must remain in the **original language** of the source text.
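
A quick check after a run can confirm that the fixed output set is complete and UTF-8 encoded. This is an illustrative sketch, not part of the skill: `check_outputs` and the `Demo` abbreviation are hypothetical, and the check deliberately ignores unrelated files in the directory, on the assumption that `outputs/` may hold results for several papers.

```python
import tempfile
from pathlib import Path

EXPECTED_SUFFIXES = (
    "experiment-summary.md",
    "models.csv",
    "methods.csv",
    "biomarkers.csv",
)


def check_outputs(out_dir: Path, paper_abbreviation: str) -> list[str]:
    """Return a list of problems; an empty list means the set looks complete."""
    problems: list[str] = []
    for suffix in EXPECTED_SUFFIXES:
        path = out_dir / f"{paper_abbreviation}-{suffix}"
        if not path.is_file():
            problems.append(f"missing: {path.name}")
            continue
        try:
            # Decoding the raw bytes fails if the file is not valid UTF-8.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"not UTF-8: {path.name}")
    return problems


# Demo: create only three of the four files, so one is reported missing.
demo_dir = Path(tempfile.mkdtemp())
for suffix in EXPECTED_SUFFIXES[:3]:
    (demo_dir / f"Demo-{suffix}").write_text("", encoding="utf-8")
problems = check_outputs(demo_dir, "Demo")
```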